{
 "metadata": {
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5-final"
  },
  "orig_nbformat": 2,
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3",
   "language": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2,
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import nltk\n",
    "from tools import show_subtitle"
   ]
  },
  {
   "source": [
    "# Chap9 建立基于特征的文法（语法）\n",
    "\n",
    "本章的学习目标：\n",
    "\n",
    "1.  怎样用特征扩展无关上下文文法的框架，以获得对文法类别和产生式的更细粒度的控制？\n",
    "2.  特征结构的主要形式化发生是什么？如何使用它们来计算？\n",
    "3.  基于特征的文法能够获得哪些语言模式和文法结构？"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "source": [
    "## 9.1 语法特征(也叫文法特征)\n",
    "\n",
    "在基于规则的上下文语法中，特征——值偶对被称为特征结构。"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 字典存储特征以及特征的值\n",
    "\n",
    "# -   'CAT' 表示语法类别；'ORTH'：表示正字法（正词法，拼写规则）\n",
    "# -   'REF' 表示 'kim' 的指示物；'REL'：表示 'chase' 表示的关系\n",
    "# -   'AGT' 表示施事（agent）角色；'PAT'： 表示受事（patient）角色\n",
    "kim = {'CAT': 'NP', 'ORTH': 'Kim', 'REF': 'k'}\n",
    "chase = {'CAT': 'V', 'ORTH': 'chased', 'REL': 'chase'}\n",
    "chase['AGT'] = 'sbj'  # 'sbj'（主语）作为占位符\n",
    "chase['PAT'] = 'obj'  # 'obj'（宾语）作为占位符\n",
    "lee = {'CAT': 'NP', 'ORTH': 'Lee', 'REF': 'l'}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "['Kim', 'chased', 'Lee']"
      ]
     },
     "metadata": {},
     "execution_count": 3
    }
   ],
   "source": [
    "sent = \"Kim chased Lee\"\n",
    "tokens = sent.split()\n",
    "tokens"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def lex2fs(word):\n",
    "    for fs in [kim, lee, chase]:\n",
    "        if fs['ORTH'] == word:\n",
    "            return fs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "({'CAT': 'NP', 'ORTH': 'Kim', 'REF': 'k'},\n",
       " {'CAT': 'V', 'ORTH': 'chased', 'REL': 'chase', 'AGT': 'sbj', 'PAT': 'obj'},\n",
       " {'CAT': 'NP', 'ORTH': 'Lee', 'REF': 'l'})"
      ]
     },
     "metadata": {},
     "execution_count": 5
    }
   ],
   "source": [
    "subj, verb, obj = lex2fs(tokens[0]), lex2fs(tokens[1]), lex2fs(tokens[2])\n",
    "subj,verb,obj"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "ORTH  => chased\nREL   => chase\nAGT   => k\nPAT   => l\n"
     ]
    }
   ],
   "source": [
    "verb['AGT'] = subj['REF']  # agent of 'chase' is Kim\n",
    "verb['PAT'] = obj['REF']  # patient of 'chase' is Lee\n",
    "for k in ['ORTH', 'REL', 'AGT', 'PAT']:  # check featstruct of 'chase'\n",
    "    print(\"%-5s => %s\" % (k, verb[k]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "{'CAT': 'V',\n",
       " 'ORTH': 'surprised',\n",
       " 'REL': 'surprise',\n",
       " 'SRC': 'SBJ',\n",
       " 'EXP': 'obj'}"
      ]
     },
     "metadata": {},
     "execution_count": 7
    }
   ],
   "source": [
    "# 'SRC'：表示源事（source）的角色；'EXP'：表示体验者（experiencer）的角色\n",
    "surprise = {'CAT': 'V', 'ORTH': 'surprised', 'REL': 'surprise', 'SRC': 'SBJ', 'EXP': 'obj'}\n",
    "surprise\n",
    "# 特征结构是非常强大的，特征的额外表现力（Ref：Sec 9.3）开辟了用于描述语言结构复杂性的可能。"
   ]
  },
  {
   "source": [
    "### 9.1.1 句法协议\n",
    "\n",
    "协议（agreement）：动词的形态属性和主语名词短语的句法属性一起变化的过程。\n",
    "\n",
    "表9-1 英语规则动词的协议范式\n"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "source": [
    "### 9.1.2 使用属性和约束"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "% start S\n# ###################\n# Grammar Productions\n# ###################\n# S expansion productions\nS -> NP[NUM=?n] VP[NUM=?n]\n# NP expansion productions\nNP[NUM=?n] -> N[NUM=?n] \nNP[NUM=?n] -> PropN[NUM=?n] \nNP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]\nNP[NUM=pl] -> N[NUM=pl] \n# VP expansion productions\nVP[TENSE=?t, NUM=?n] -> IV[TENSE=?t, NUM=?n]\nVP[TENSE=?t, NUM=?n] -> TV[TENSE=?t, NUM=?n] NP\n# ###################\n# Lexical Productions\n# ###################\nDet[NUM=sg] -> 'this' | 'every'\nDet[NUM=pl] -> 'these' | 'all'\nDet -> 'the' | 'some' | 'several'\nPropN[NUM=sg]-> 'Kim' | 'Jody'\nN[NUM=sg] -> 'dog' | 'girl' | 'car' | 'child'\nN[NUM=pl] -> 'dogs' | 'girls' | 'cars' | 'children' \nIV[TENSE=pres,  NUM=sg] -> 'disappears' | 'walks'\nTV[TENSE=pres, NUM=sg] -> 'sees' | 'likes'\nIV[TENSE=pres,  NUM=pl] -> 'disappear' | 'walk'\nTV[TENSE=pres, NUM=pl] -> 'see' | 'like'\nIV[TENSE=past] -> 'disappeared' | 'walked'\nTV[TENSE=past] -> 'saw' | 'liked'\n"
     ]
    }
   ],
   "source": [
    "# Ex9-1 基于特征的语法的例子\n",
    "nltk.data.show_cfg('grammars/book_grammars/feat0.fcfg')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "|.Kim .like.chil.|\nLeaf Init Rule:\n|[----]    .    .| [0:1] 'Kim'\n|.    [----]    .| [1:2] 'likes'\n|.    .    [----]| [2:3] 'children'\nFeature Bottom Up Predict Combine Rule:\n|[----]    .    .| [0:1] PropN[NUM='sg'] -> 'Kim' *\nFeature Bottom Up Predict Combine Rule:\n|[----]    .    .| [0:1] NP[NUM='sg'] -> PropN[NUM='sg'] *\nFeature Bottom Up Predict Combine Rule:\n|[---->    .    .| [0:1] S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'sg'}\nFeature Bottom Up Predict Combine Rule:\n|.    [----]    .| [1:2] TV[NUM='sg', TENSE='pres'] -> 'likes' *\nFeature Bottom Up Predict Combine Rule:\n|.    [---->    .| [1:2] VP[NUM=?n, TENSE=?t] -> TV[NUM=?n, TENSE=?t] * NP[] {?n: 'sg', ?t: 'pres'}\nFeature Bottom Up Predict Combine Rule:\n|.    .    [----]| [2:3] N[NUM='pl'] -> 'children' *\nFeature Bottom Up Predict Combine Rule:\n|.    .    [----]| [2:3] NP[NUM='pl'] -> N[NUM='pl'] *\nFeature Bottom Up Predict Combine Rule:\n|.    .    [---->| [2:3] S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'pl'}\nFeature Single Edge Fundamental Rule:\n|.    [---------]| [1:3] VP[NUM='sg', TENSE='pres'] -> TV[NUM='sg', TENSE='pres'] NP[] *\nFeature Single Edge Fundamental Rule:\n|[==============]| [0:3] S[] -> NP[NUM='sg'] VP[NUM='sg'] *\n(S[]\n  (NP[NUM='sg'] (PropN[NUM='sg'] Kim))\n  (VP[NUM='sg', TENSE='pres']\n    (TV[NUM='sg', TENSE='pres'] likes)\n    (NP[NUM='pl'] (N[NUM='pl'] children))))\n"
     ]
    }
   ],
   "source": [
    "# Ex9-2 跳跃基于特征的图表分析器\n",
    "tokens = 'Kim likes children'.split()\n",
    "from nltk import load_parser\n",
    "\n",
    "cp = load_parser('grammars/book_grammars/feat0.fcfg', trace = 2)\n",
    "for tree in cp.parse(tokens):\n",
    "    print(tree)"
   ]
  },
  {
   "source": [
    "### 9.1.3 术语\n",
    "\n",
    "简单的值通常称为原子。\n",
    "\n",
    "-   原子值的一种特殊情况是布尔值。\n",
    "\n",
    "AGR是一个复杂值。\n",
    "\n",
    "属性——值矩阵（Attribute-Value Matrix，AVM）\n"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "source": [
    "## 9.2 处理特征结构"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "[NUM='sg', TENSE='past']"
      ]
     },
     "metadata": {},
     "execution_count": 10
    }
   ],
   "source": [
    "# 特征结构的构建；两个不同特征结构的统一（合一）运算。\n",
    "fs1 = nltk.FeatStruct(TENSE = 'past', NUM = 'sg')\n",
    "fs1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "fs1['GND']= fem\n--------------- >fs1< ---------------\n"
     ]
    },
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "[CASE='acc', GND='fem', NUM='pl', PER=3]"
      ]
     },
     "metadata": {},
     "execution_count": 11
    }
   ],
   "source": [
    "fs1 = nltk.FeatStruct(PER = 3, NUM = 'pl', GND = 'fem')\n",
    "print(\"fs1['GND']=\",fs1['GND'])\n",
    "fs1['CASE'] = 'acc'\n",
    "show_subtitle(\"fs1\")\n",
    "fs1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "--------------- >fs2['AGR']< ---------------\n"
     ]
    },
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "[CASE='acc', GND='fem', NUM='pl', PER=3]"
      ]
     },
     "metadata": {},
     "execution_count": 12
    }
   ],
   "source": [
    "fs2 = nltk.FeatStruct(POS = 'N', AGR = fs1)\n",
    "show_subtitle(\"fs2['AGR']\")\n",
    "fs2['AGR']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "--------------- >fs2< ---------------\n"
     ]
    },
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "[AGR=[CASE='acc', GND='fem', NUM='pl', PER=3], POS='N']"
      ]
     },
     "metadata": {},
     "execution_count": 13
    }
   ],
   "source": [
    "show_subtitle(\"fs2\")\n",
    "fs2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "[AGR=[GND='fem', NUM='pl', PER=3], POS='N']"
      ]
     },
     "metadata": {},
     "execution_count": 14
    }
   ],
   "source": [
    "nltk.FeatStruct(\"[POS='N',AGR=[PER=3, NUM='pl', GND='fem']]\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "[AGE=33, NAME='Lee', TELNO='13918181818']"
      ]
     },
     "metadata": {},
     "execution_count": 15
    }
   ],
   "source": [
    "# 特征结构也可以用来表示其他数据\n",
    "nltk.FeatStruct(NAME = 'Lee', TELNO = '13918181818', AGE = 33)"
   ]
  },
  {
   "source": [
    "特征结构也可以使用有向无环图（Directed Acyclic Graph，DAG）来表示，相当于前面的AVM\n",
    "\n",
    "DAG可以使用结构共享或者重入来表示两条路径具有相同的值，即它们是等价的。"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "[ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'], Name='Lee', SPOUSE=[ADDRESS->(1), NAME='Kim']]"
      ]
     },
     "metadata": {},
     "execution_count": 16
    }
   ],
   "source": [
    "# 括号()里面的整数称为标记或者同指标志（coindex）。\n",
    "nltk.FeatStruct(\"\"\"[Name='Lee', ADDRESS=(1)[NUMBER=74,STREET='rue Pascal'],SPOUSE=[NAME='Kim',ADDRESS->(1)]]\"\"\")\n"
   ]
  },
  {
   "source": [
    "### 9.2.1 包含（蕴涵） 和 统一（合一） （Ref：《自然语言处理综论》Ch15）\n",
    "\n",
    "一般的特征结构包含（蕴涵）特殊的特征结构\n",
    "\n",
    "合并两个特征结构的信息称为统一（合一），统一（合一）运算是对称的。"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "--------------- >fs1.unify(fs2)< ---------------\n[ CITY   = 'Paris'       ]\n[ NUMBER = 74            ]\n[ STREET = 'rule Pascal' ]\n--------------- >fs2.unify(fs1)< ---------------\n[ CITY   = 'Paris'       ]\n[ NUMBER = 74            ]\n[ STREET = 'rule Pascal' ]\n--------------- >fs1.unify(fs2) == fs2.unify(fs1)< ---------------\nTrue\n"
     ]
    }
   ],
   "source": [
    "# 合一的相容运算\n",
    "fs1 = nltk.FeatStruct(NUMBER = 74, STREET = 'rule Pascal')\n",
    "fs2 = nltk.FeatStruct(CITY = 'Paris')\n",
    "show_subtitle(\"fs1.unify(fs2)\")\n",
    "print(fs1.unify(fs2))\n",
    "show_subtitle(\"fs2.unify(fs1)\")\n",
    "print(fs2.unify(fs1))\n",
    "show_subtitle(\"fs1.unify(fs2) == fs2.unify(fs1)\")\n",
    "print(fs1.unify(fs2) == fs2.unify(fs1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "None\n"
     ]
    }
   ],
   "source": [
    "# 合一的失败运算\n",
    "fs0 = nltk.FeatStruct(A = 'a')\n",
    "fs1 = nltk.FeatStruct(A = 'b')\n",
    "fs2 = fs0.unify(fs1)\n",
    "print(fs2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "[ ADDRESS = [ NUMBER = 74           ]               ]\n[           [ STREET = 'rue Pascal' ]               ]\n[                                                   ]\n[ Name    = 'Lee'                                   ]\n[                                                   ]\n[           [           [ CITY   = 'Paris'      ] ] ]\n[           [ ADDRESS = [ NUMBER = 74           ] ] ]\n[ SPOUSE  = [           [ STREET = 'rue Pascal' ] ] ]\n[           [                                     ] ]\n[           [ NAME    = 'Kim'                     ] ]\n"
     ]
    }
   ],
   "source": [
    "fs0 = nltk.FeatStruct(\"[SPOUSE=[ADDRESS=[CITY=Paris]]]\")\n",
    "# 无同指标志的特征结构的相容运算\n",
    "fs1 = nltk.FeatStruct(\"\"\"[Name='Lee', \n",
    "ADDRESS=[NUMBER=74,STREET='rue Pascal'],\n",
    "SPOUSE=[NAME='Kim',\n",
    "ADDRESS=[NUMBER=74,STREET='rue Pascal']]]\"\"\")\n",
    "print(fs0.unify(fs1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "[               [ CITY   = 'Paris'      ] ]\n[ ADDRESS = (1) [ NUMBER = 74           ] ]\n[               [ STREET = 'rue Pascal' ] ]\n[                                         ]\n[ Name    = 'Lee'                         ]\n[                                         ]\n[ SPOUSE  = [ ADDRESS -> (1)  ]           ]\n[           [ NAME    = 'Kim' ]           ]\n"
     ]
    }
   ],
   "source": [
    "# 有同指标志的特征结构的相容运算\n",
    "fs2 = nltk.FeatStruct(\"\"\"[Name='Lee', \n",
    "ADDRESS=(1)[NUMBER=74,STREET='rue Pascal'],\n",
    "SPOUSE=[NAME='Kim',\n",
    "ADDRESS->(1)]]\"\"\")\n",
    "print(fs0.unify(fs2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "--------------- >fs2< ---------------\n[ ADDRESS1 = ?x ]\n[ ADDRESS2 = ?x ]\n[ ADDRESS3 = ?y ]\n[ ADDRESS4 = ?y ]\n--------------- >fs2.unify(fs1)< ---------------\n[ ADDRESS1 = (1) [ NUMBER = 74           ] ]\n[                [ STREET = 'rue Pascal' ] ]\n[                                          ]\n[ ADDRESS2 -> (1)                          ]\n[                                          ]\n[ ADDRESS3 = (2) [ NAME = 'Lee' ]          ]\n[                                          ]\n[ ADDRESS4 -> (2)                          ]\n"
     ]
    }
   ],
   "source": [
    "# 使用变量?x表示的特征结构的相容运算\n",
    "fs1 = nltk.FeatStruct(\"[ADDRESS1=[NUMBER=74, STREET='rue Pascal'], ADDRESS4=[NAME='Lee']]\")\n",
    "fs2 = nltk.FeatStruct(\"[ADDRESS1=?x, ADDRESS2=?x, ADDRESS3=?y, ADDRESS4=?y]\")\n",
    "show_subtitle(\"fs2\")\n",
    "print(fs2)\n",
    "show_subtitle(\"fs2.unify(fs1)\")\n",
    "print(fs2.unify(fs1))"
   ]
  },
  {
   "source": [
    "## 9.3 扩展基于特征的文法（语法）\n",
    "### 9.3.1 子类别（次范畴化）\n",
    "\n",
    "广义短语结构语法（Generalized Phrase Structure Grammar，GPSG），\n",
    "\n",
    "允许词汇类别支持SUBCAT特征（表明项目所属的子类别）\n",
    "\n",
    "### 9.3.2 回顾核心词概念\n",
    "\n",
    "### 9.3.3 助动词和倒装\n"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "source": [
    "### 9.3.4 无限制依赖成分"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "% start S\n# ###################\n# Grammar Productions\n# ###################\nS[-INV] -> NP VP\nS[-INV]/?x -> NP VP/?x\nS[-INV] -> NP S/NP\nS[-INV] -> Adv[+NEG] S[+INV]\nS[+INV] -> V[+AUX] NP VP\nS[+INV]/?x -> V[+AUX] NP VP/?x\nSBar -> Comp S[-INV]\nSBar/?x -> Comp S[-INV]/?x\nVP -> V[SUBCAT=intrans, -AUX]\nVP -> V[SUBCAT=trans, -AUX] NP\nVP/?x -> V[SUBCAT=trans, -AUX] NP/?x\nVP -> V[SUBCAT=clause, -AUX] SBar\nVP/?x -> V[SUBCAT=clause, -AUX] SBar/?x\nVP -> V[+AUX] VP\nVP/?x -> V[+AUX] VP/?x\n# ###################\n# Lexical Productions\n# ###################\nV[SUBCAT=intrans, -AUX] -> 'walk' | 'sing'\nV[SUBCAT=trans, -AUX] -> 'see' | 'like'\nV[SUBCAT=clause, -AUX] -> 'say' | 'claim'\nV[+AUX] -> 'do' | 'can'\nNP[-WH] -> 'you' | 'cats'\nNP[+WH] -> 'who'\nAdv[+NEG] -> 'rarely' | 'never'\nNP/NP ->\nComp -> 'that'\n"
     ]
    }
   ],
   "source": [
    "# 具有倒装从句和长距离依赖的产生式的语法，使用斜线类别\n",
    "nltk.data.show_cfg('grammars/book_grammars/feat1.fcfg')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "(S[-INV]\n  (NP[+WH] who)\n  (S[+INV]/NP[]\n    (V[+AUX] do)\n    (NP[-WH] you)\n    (VP[]/NP[]\n      (V[-AUX, SUBCAT='clause'] claim)\n      (SBar[]/NP[]\n        (Comp[] that)\n        (S[-INV]/NP[]\n          (NP[-WH] you)\n          (VP[]/NP[] (V[-AUX, SUBCAT='trans'] like) (NP[]/NP[] )))))))\n"
     ]
    }
   ],
   "source": [
    "tokens = 'who do you claim that you like'.split()\n",
    "from nltk import load_parser\n",
    "\n",
    "cp = load_parser('grammars/book_grammars/feat1.fcfg')\n",
    "tree=[]\n",
    "for tree in cp.parse(tokens):\n",
    "    print(tree)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "(S[-INV]\n  (NP[-WH] you)\n  (VP[]\n    (V[-AUX, SUBCAT='clause'] claim)\n    (SBar[]\n      (Comp[] that)\n      (S[-INV]\n        (NP[-WH] you)\n        (VP[] (V[-AUX, SUBCAT='trans'] like) (NP[-WH] cats))))))\n"
     ]
    }
   ],
   "source": [
    "tokens = 'you claim that you like cats'.split()\n",
    "for tree in cp.parse(tokens):\n",
    "    print(tree)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "(S[-INV]\n  (Adv[+NEG] rarely)\n  (S[+INV]\n    (V[+AUX] do)\n    (NP[-WH] you)\n    (VP[] (V[-AUX, SUBCAT='intrans'] sing))))\n"
     ]
    }
   ],
   "source": [
    "tokens = 'rarely do you sing'.split()\n",
    "for tree in cp.parse(tokens):\n",
    "    print(tree)"
   ]
  },
  {
   "source": [
    "### 9.3.5 德语中的格和性别"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "% start S\n# Grammar Productions\nS -> NP[CASE=nom, AGR=?a] VP[AGR=?a]\nNP[CASE=?c, AGR=?a] -> PRO[CASE=?c, AGR=?a]\nNP[CASE=?c, AGR=?a] -> Det[CASE=?c, AGR=?a] N[CASE=?c, AGR=?a]\nVP[AGR=?a] -> IV[AGR=?a]\nVP[AGR=?a] -> TV[OBJCASE=?c, AGR=?a] NP[CASE=?c]\n# Lexical Productions\n# Singular determiners\n# masc\nDet[CASE=nom, AGR=[GND=masc,PER=3,NUM=sg]] -> 'der' \nDet[CASE=dat, AGR=[GND=masc,PER=3,NUM=sg]] -> 'dem'\nDet[CASE=acc, AGR=[GND=masc,PER=3,NUM=sg]] -> 'den'\n# fem\nDet[CASE=nom, AGR=[GND=fem,PER=3,NUM=sg]] -> 'die' \nDet[CASE=dat, AGR=[GND=fem,PER=3,NUM=sg]] -> 'der'\nDet[CASE=acc, AGR=[GND=fem,PER=3,NUM=sg]] -> 'die' \n# Plural determiners\nDet[CASE=nom, AGR=[PER=3,NUM=pl]] -> 'die' \nDet[CASE=dat, AGR=[PER=3,NUM=pl]] -> 'den' \nDet[CASE=acc, AGR=[PER=3,NUM=pl]] -> 'die' \n# Nouns\nN[AGR=[GND=masc,PER=3,NUM=sg]] -> 'Hund'\nN[CASE=nom, AGR=[GND=masc,PER=3,NUM=pl]] -> 'Hunde'\nN[CASE=dat, AGR=[GND=masc,PER=3,NUM=pl]] -> 'Hunden'\nN[CASE=acc, AGR=[GND=masc,PER=3,NUM=pl]] -> 'Hunde'\nN[AGR=[GND=fem,PER=3,NUM=sg]] -> 'Katze'\nN[AGR=[GND=fem,PER=3,NUM=pl]] -> 'Katzen'\n# Pronouns\nPRO[CASE=nom, AGR=[PER=1,NUM=sg]] -> 'ich'\nPRO[CASE=acc, AGR=[PER=1,NUM=sg]] -> 'mich'\nPRO[CASE=dat, AGR=[PER=1,NUM=sg]] -> 'mir'\nPRO[CASE=nom, AGR=[PER=2,NUM=sg]] -> 'du'\nPRO[CASE=nom, AGR=[PER=3,NUM=sg]] -> 'er' | 'sie' | 'es'\nPRO[CASE=nom, AGR=[PER=1,NUM=pl]] -> 'wir'\nPRO[CASE=acc, AGR=[PER=1,NUM=pl]] -> 'uns'\nPRO[CASE=dat, AGR=[PER=1,NUM=pl]] -> 'uns'\nPRO[CASE=nom, AGR=[PER=2,NUM=pl]] -> 'ihr'\nPRO[CASE=nom, AGR=[PER=3,NUM=pl]] -> 'sie'\n# Verbs\nIV[AGR=[NUM=sg,PER=1]] -> 'komme'\nIV[AGR=[NUM=sg,PER=2]] -> 'kommst'\nIV[AGR=[NUM=sg,PER=3]] -> 'kommt'\nIV[AGR=[NUM=pl, PER=1]] -> 'kommen'\nIV[AGR=[NUM=pl, PER=2]] -> 'kommt'\nIV[AGR=[NUM=pl, PER=3]] -> 'kommen'\nTV[OBJCASE=acc, AGR=[NUM=sg,PER=1]] -> 'sehe' | 'mag'\nTV[OBJCASE=acc, AGR=[NUM=sg,PER=2]] -> 'siehst' | 'magst'\nTV[OBJCASE=acc, AGR=[NUM=sg,PER=3]] -> 'sieht' | 'mag'\nTV[OBJCASE=dat, AGR=[NUM=sg,PER=1]] -> 'folge' | 'helfe'\nTV[OBJCASE=dat, AGR=[NUM=sg,PER=2]] -> 'folgst' | 'hilfst'\nTV[OBJCASE=dat, AGR=[NUM=sg,PER=3]] -> 'folgt' | 'hilft'\nTV[OBJCASE=acc, AGR=[NUM=pl,PER=1]] -> 'sehen' | 'moegen'\nTV[OBJCASE=acc, AGR=[NUM=pl,PER=2]] -> 'sieht' | 'moegt'\nTV[OBJCASE=acc, AGR=[NUM=pl,PER=3]] -> 'sehen' | 'moegen'\nTV[OBJCASE=dat, AGR=[NUM=pl,PER=1]] -> 'folgen' | 'helfen'\nTV[OBJCASE=dat, AGR=[NUM=pl,PER=2]] -> 'folgt' | 'helft'\nTV[OBJCASE=dat, AGR=[NUM=pl,PER=3]] -> 'folgen' | 'helfen'\n"
     ]
    }
   ],
   "source": [
    "# Ex9-4 基于特征的语法的例子（表示带格的协议的相互作用）\n",
    "nltk.data.show_cfg('grammars/book_grammars/german.fcfg')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "(S[]\n  (NP[AGR=[NUM='sg', PER=1], CASE='nom']\n    (PRO[AGR=[NUM='sg', PER=1], CASE='nom'] ich))\n  (VP[AGR=[NUM='sg', PER=1]]\n    (TV[AGR=[NUM='sg', PER=1], OBJCASE='dat'] folge)\n    (NP[AGR=[GND='fem', NUM='pl', PER=3], CASE='dat']\n      (Det[AGR=[NUM='pl', PER=3], CASE='dat'] den)\n      (N[AGR=[GND='fem', NUM='pl', PER=3]] Katzen))))\n"
     ]
    }
   ],
   "source": [
    "tokens = 'ich folge den Katzen'.split()\n",
    "cp = load_parser('grammars/book_grammars/german.fcfg')\n",
    "for tree in cp.parse(tokens):\n",
    "    print(tree)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "|.ich.fol.den.Kat.|\nLeaf Init Rule:\n|[---]   .   .   .| [0:1] 'ich'\n|.   [---]   .   .| [1:2] 'folge'\n|.   .   [---]   .| [2:3] 'den'\n|.   .   .   [---]| [3:4] 'Katze'\nFeature Bottom Up Predict Combine Rule:\n|[---]   .   .   .| [0:1] PRO[AGR=[NUM='sg', PER=1], CASE='nom'] -> 'ich' *\nFeature Bottom Up Predict Combine Rule:\n|[---]   .   .   .| [0:1] NP[AGR=[NUM='sg', PER=1], CASE='nom'] -> PRO[AGR=[NUM='sg', PER=1], CASE='nom'] *\nFeature Bottom Up Predict Combine Rule:\n|[--->   .   .   .| [0:1] S[] -> NP[AGR=?a, CASE='nom'] * VP[AGR=?a] {?a: [NUM='sg', PER=1]}\nFeature Bottom Up Predict Combine Rule:\n|.   [---]   .   .| [1:2] TV[AGR=[NUM='sg', PER=1], OBJCASE='dat'] -> 'folge' *\nFeature Bottom Up Predict Combine Rule:\n|.   [--->   .   .| [1:2] VP[AGR=?a] -> TV[AGR=?a, OBJCASE=?c] * NP[CASE=?c] {?a: [NUM='sg', PER=1], ?c: 'dat'}\nFeature Bottom Up Predict Combine Rule:\n|.   .   [---]   .| [2:3] Det[AGR=[GND='masc', NUM='sg', PER=3], CASE='acc'] -> 'den' *\n|.   .   [---]   .| [2:3] Det[AGR=[NUM='pl', PER=3], CASE='dat'] -> 'den' *\nFeature Bottom Up Predict Combine Rule:\n|.   .   [--->   .| [2:3] NP[AGR=?a, CASE=?c] -> Det[AGR=?a, CASE=?c] * N[AGR=?a, CASE=?c] {?a: [NUM='pl', PER=3], ?c: 'dat'}\nFeature Bottom Up Predict Combine Rule:\n|.   .   [--->   .| [2:3] NP[AGR=?a, CASE=?c] -> Det[AGR=?a, CASE=?c] * N[AGR=?a, CASE=?c] {?a: [GND='masc', NUM='sg', PER=3], ?c: 'acc'}\nFeature Bottom Up Predict Combine Rule:\n|.   .   .   [---]| [3:4] N[AGR=[GND='fem', NUM='sg', PER=3]] -> 'Katze' *\n"
     ]
    }
   ],
   "source": [
    "tokens='ich folge den Katze'.split()\n",
    "cp = load_parser('grammars/book_grammars/german.fcfg',trace=2)\n",
    "for tree in cp.parse(tokens):\n",
    "    print(tree)"
   ]
  },
  {
   "source": [
    "## 9.4 小结\n",
    "*   上下文无关语法的传统分类是原子符号。特征结构的重要作用之一是捕捉精细的区分，否则将需要数量翻倍的原子类别\n",
    "*   通过使用特征值的变量，可以表达出语法产生式中的限制，使得不同的特征规格之间相互依赖\n",
    "*   在词汇层面指定固定的特征值，并且限制短语中的特征值，使其与“孩子”的对应值相统一（？）\n",
    "*   特征值可以是原子的，也可以是复杂的。原子值的特定类别是布尔值\n",
    "*   两个特征可以共享一个值（原子的或者复杂的）。具有共享值的结构被称为重入。共享值被表示AVM中的数字索引（或者标记）\n",
    "*   特征结构中的路径是特征元组，对应着从图底部开始的弧序列上的标签。\n",
    "*   如果两条路径共享一个值，那么这两条路径是等价的。\n",
    "*   包含的特征结构是偏序的。特征结构A蕴涵特征结构B，说明特征结构A更加一般，特征结构B更加特征\n",
    "*   如果统一（合一）运算在特征结构中指定了一条路径，那么同时指定了所有与这条路径等价的其他路径。\n",
    "*   使用特征结构对大量的语言学现象进行简洁的分析，包括：\n",
    "    *   动词子类别\n",
    "    *   倒装结构\n",
    "    *   无限制依赖结构\n",
    "    *   格支配"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "source": [
    "## 9.5 深入阅读\n",
    "\n",
    "理论语言学中使用特征是捕捉语音的音素特征\n",
    "\n",
    "计算语言学提出了语言功能可以被属性——值结构统一捕获\n",
    "\n",
    "词汇功能语法表示语法关系和与成分结构短语关联的谓词参数结构"
   ],
   "cell_type": "markdown",
   "metadata": {}
  }
 ]
}